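Thanks to the development of deep neural networks, and especially the recently developed unsupervised JND generation models, just noticeable difference (JND) modeling has improved significantly. However, these models share a major drawback: the generated JND is assessed in the real-world signal domain rather than in the perceptual domain of the human brain. The two assessments differ noticeably, because a visual signal in the real world is encoded before the human visual system (HVS) delivers it to the brain. We therefore propose an HVS-inspired signal degradation network for JND estimation. To this end, we carefully analyze the HVS perceptual process in subjective JND viewing to obtain relevant insights, and then design an HVS-inspired signal degradation (HVS-SD) network to represent the signal degradation in the HVS. On the one hand, the well-learned HVS-SD lets us assess the JND in the perceptual domain. On the other hand, it provides more accurate prior information to better guide JND generation. In addition, since a reasonable JND should not cause visual attention to shift, a visual attention loss is proposed to control JND generation. Experimental results show that the proposed method achieves state-of-the-art performance in accurately estimating the redundancy of the HVS. Source code will be available at https://github.com/jianjin008/hvs-sd-jnd.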
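The performance of time series forecasting has recently improved greatly with the introduction of transformers. In this paper, we propose a general multi-scale framework that can be applied to state-of-the-art transformer-based time series forecasting models, including Autoformer and Informer. By forecasting the time series at multiple scales with shared weights, architecture adaptations, and a specially designed normalization scheme, we achieve significant performance improvements with minimal additional computational overhead. Through detailed ablation studies, we demonstrate the effectiveness of the proposed architectural and methodological innovations. Furthermore, our experiments on four public datasets show that the proposed multi-scale framework outperforms the corresponding baselines, with average improvements of 13% and 38% over Autoformer and Informer, respectively.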
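Recently, various view synthesis distortion estimation models have been studied to better serve 3-D video coding. However, they can hardly model quantitatively the relationship among different levels of depth change, texture degeneration, and view synthesis distortion (VSD), which is crucial for rate-distortion optimization and rate allocation. In this paper, a view synthesis distortion estimation model based on an auto-weighted layer representation is developed. First, the sub-VSD (S-VSD) is defined according to the level of depth change and the associated texture degeneration. A set of theoretical derivations then shows that the VSD can be approximately decomposed into the S-VSDs multiplied by their associated weights. To obtain the S-VSDs, a layer-based representation is developed in which all pixels with the same level of depth change are represented by one layer, enabling efficient S-VSD calculation at the layer level. Meanwhile, a nonlinear mapping function is learned to accurately represent the relationship between the VSD and the S-VSDs, automatically providing weights for the S-VSDs during VSD estimation. To learn this function, a dataset of VSDs and their associated S-VSDs is built. Experimental results show that, once the associated S-VSDs are available, the VSD can be accurately estimated with the weights learned by the nonlinear mapping function. The proposed method outperforms the relevant state-of-the-art methods in both accuracy and efficiency. The dataset and source code of the proposed method will be available at https://github.com/jianjin008/.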
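The decomposition described in this abstract can be written compactly. The form below is only a sketch: the symbols $K$ (number of depth-change levels), $D_k$ (the S-VSD of level $k$), $w_k$ (its weight), and $f_\theta$ (the learned nonlinear mapping) are notation assumed here for illustration, not taken from the paper.

```latex
% Assumed notation: D_k is the S-VSD of depth-change level k, w_k its weight.
\mathrm{VSD} \;\approx\; \sum_{k=1}^{K} w_k \, D_k ,
\qquad
(w_1, \dots, w_K) = f_\theta(D_1, \dots, D_K)
```

Under this reading, the weights are not fixed constants but are produced by the learned mapping from the S-VSDs themselves.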
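With the development of the Artificial Intelligence of Things (AIoT), a huge amount of visual data, such as images and videos, is produced in our daily work and life. These visual data are used not only for human viewing or understanding but also for machine analysis or decision-making, for example in intelligent surveillance, automated vehicles, and many other smart city applications. To this end, this work proposes a new image codec paradigm for both human and machine use. First, a high-level instance segmentation map and low-level signal features are extracted with neural networks. The instance segmentation map is then further represented as a profile using the proposed 16-bit gray-scale representation. After that, both the 16-bit gray-scale profile and the signal features are encoded with a lossless codec. Meanwhile, an image predictor is designed and trained to reconstruct a general-quality image from the 16-bit gray-scale profile and the signal features. Finally, the residual map between the original image and the predicted one is compressed with a lossy codec and used for high-quality image reconstruction. With this design, on the one hand, we achieve scalable image compression that meets the requirements of different kinds of human consumption; on the other hand, we can perform several machine vision tasks, such as object classification, detection, and segmentation, directly at the decoder side from the decoded 16-bit gray-scale profile. Experimental results show that the proposed codec achieves results comparable to most learning-based codecs and outperforms traditional codecs (e.g., BPG and JPEG2000) in terms of PSNR and MS-SSIM for image reconstruction. It also outperforms existing codecs in terms of mAP for object detection and segmentation.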
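To make the idea of a 16-bit gray-scale profile concrete, the sketch below packs a class label and an instance index into one 16-bit value per pixel. The bit layout (high 8 bits for the class, low 8 bits for the instance) and all function names are assumptions made for illustration; the abstract does not specify how the profile is actually laid out.

```python
from typing import List, Tuple

# Hypothetical layout (not from the paper): high 8 bits = class id,
# low 8 bits = instance index within that class.

def pack_pixel(class_id: int, instance_id: int) -> int:
    """Pack one pixel's class and instance ids into a 16-bit gray value."""
    if not (0 <= class_id < 256 and 0 <= instance_id < 256):
        raise ValueError("class_id and instance_id must each fit in 8 bits")
    return (class_id << 8) | instance_id

def unpack_pixel(value: int) -> Tuple[int, int]:
    """Recover (class_id, instance_id) from a 16-bit gray value."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value must fit in 16 bits")
    return value >> 8, value & 0xFF

def pack_map(class_map: List[List[int]],
             instance_map: List[List[int]]) -> List[List[int]]:
    """Pack two same-sized 2-D maps into one 16-bit gray-scale profile."""
    return [
        [pack_pixel(c, i) for c, i in zip(class_row, instance_row)]
        for class_row, instance_row in zip(class_map, instance_map)
    ]
```

Because the packing is invertible, a decoder that receives the losslessly coded profile could read class and instance labels directly, without reconstructing the image first.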
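Deep discriminative models (DDMs), such as deep regression forests, have recently been studied extensively for problems such as facial age estimation, head pose estimation, and gaze estimation. These problems are challenging in part because a large amount of effective training data free of noise and bias is often unavailable. Although some progress has been made by learning more discriminative features or reweighting samples, we argue that it is more desirable to learn to discriminate gradually, as humans do. We therefore resort to self-paced learning (SPL). A natural question then arises: can a self-paced regime lead DDMs to more robust and less biased solutions? A serious problem with SPL, first discussed in this work, is that it tends to aggravate the bias of solutions, especially for clearly imbalanced data. To this end, this paper proposes a new self-paced paradigm for deep discriminative models, which distinguishes noisy examples from underrepresented ones according to the output likelihood and entropy associated with each example, and tackles the fundamental ranking problem in SPL from a new perspective: fairness. The paradigm is fundamental and can easily be combined with a variety of DDMs. Extensive experiments on three computer vision tasks, namely facial age estimation, head pose estimation, and gaze estimation, demonstrate the efficacy of our paradigm. To the best of our knowledge, this is the first work in the SPL literature to consider ranking fairness when constructing a self-paced regime.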
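The bias problem mentioned in this abstract can be illustrated with the classical hard-threshold form of self-paced learning, in which an example is used only if its current loss is below an age parameter. The sketch below is that textbook scheme, not the fairness-aware paradigm the paper proposes; the function names, the group labels, and the toy losses are all made up for illustration.

```python
from typing import Dict, List

def spl_weights(losses: List[float], lam: float) -> List[int]:
    """Classical hard SPL: keep an example only if its loss is below lam."""
    return [1 if loss < lam else 0 for loss in losses]

def selected_fraction_by_group(weights: List[int],
                               groups: List[str]) -> Dict[str, float]:
    """Fraction of each group's examples that the SPL weights keep."""
    totals: Dict[str, int] = {}
    picked: Dict[str, int] = {}
    for w, g in zip(weights, groups):
        totals[g] = totals.get(g, 0) + 1
        picked[g] = picked.get(g, 0) + w
    return {g: picked[g] / totals[g] for g in totals}

# Toy data: the underrepresented group tends to have higher losses early on.
losses = [0.1, 0.2, 0.3, 0.9, 1.1]
groups = ["major", "major", "major", "minor", "minor"]
weights = spl_weights(losses, lam=0.5)
fractions = selected_fraction_by_group(weights, groups)
```

With these toy values every "major" example is kept and every "minor" example is dropped, which is the kind of ranking unfairness on imbalanced data that the abstract argues a plain self-paced regime can produce.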
Despite significant progress in object categorization in recent years, a number of important challenges remain, chiefly the ability to learn from limited labeled data and to recognize object classes within a large, potentially open, set of labels. Zero-shot learning is one way of addressing these challenges, but it has only been shown to work with vocabularies of limited size and typically requires a separation between supervised and unsupervised classes, allowing the former to inform the latter but not vice versa. We propose the notion of vocabulary-informed learning to alleviate these challenges and to address supervised, zero-shot, generalized zero-shot, and open set recognition within a unified framework. Specifically, we propose a weighted maximum margin framework for semantic manifold-based recognition that incorporates distance constraints from both supervised and unsupervised vocabulary atoms. The distance constraints ensure that labeled samples are projected closer to their correct prototypes in the embedding space than to any others. We show that the resulting model improves supervised, zero-shot, generalized zero-shot, and large open set recognition, with a vocabulary of up to 310K classes, on the Animals with Attributes and ImageNet datasets.
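The distance constraints can be pictured as hinge penalties: a labeled sample should be closer to its own class prototype than to every other vocabulary atom, by some margin. The sketch below is an unweighted simplification under assumed names (`margin_violations`, the toy prototypes, squared Euclidean distance); the paper's framework is weighted and its exact formulation is not given in the abstract.

```python
from typing import Dict, List

def sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def margin_violations(embedding: List[float],
                      prototypes: Dict[str, List[float]],
                      label: str,
                      margin: float = 1.0) -> Dict[str, float]:
    """Hinge penalty for each vocabulary atom other than the true label.

    A penalty of 0 means the sample is closer to its correct prototype
    than to that atom by at least `margin`.
    """
    d_correct = sq_dist(embedding, prototypes[label])
    return {
        name: max(0.0, margin + d_correct - sq_dist(embedding, proto))
        for name, proto in prototypes.items()
        if name != label
    }

# Toy vocabulary: "wolf" stands in for an atom that has no labeled images.
prototypes = {"cat": [1.0, 0.0], "dog": [3.0, 0.0], "wolf": [1.0, 0.5]}
penalties = margin_violations([0.0, 0.0], prototypes, label="cat")
```

Here the atom without labeled images still contributes a constraint, which is the sense in which the unsupervised part of the vocabulary informs the embedding.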
Deploying reliable deep learning techniques in interdisciplinary applications requires learned models to output accurate and (even more importantly) explainable predictions. Existing approaches typically explicate network outputs in a post-hoc fashion, under the implicit assumption that faithful explanations come from accurate predictions or classifications. We make the opposite claim: explanations boost (or even determine) classification. That is, end-to-end learning of explanation factors to augment discriminative representation extraction could be a more intuitive strategy to inversely assure fine-grained explainability, e.g., in neuroimaging and neuroscience studies with high-dimensional data containing noisy, redundant, and task-irrelevant information. In this paper, we propose such an explainable geometric deep network, dubbed NeuroExplainer, with applications to uncovering altered infant cortical development patterns associated with preterm birth. Given fundamental cortical attributes as network input, NeuroExplainer adopts a hierarchical attention-decoding framework to learn fine-grained attentions and the respective discriminative representations to accurately distinguish preterm infants from term-born infants at term-equivalent age. NeuroExplainer learns the hierarchical attention-decoding modules under subject-level weak supervision coupled with targeted regularizers deduced from domain knowledge regarding brain development. These prior-guided constraints implicitly maximize the explainability metrics (i.e., fidelity, sparsity, and stability) in network training, driving the learned network to output detailed explanations and accurate classifications. Experimental results on the public dHCP benchmark suggest that NeuroExplainer yields quantitatively reliable explanation results that are qualitatively consistent with representative neuroimaging studies.
Medical image segmentation (MIS) is essential for supporting disease diagnosis and treatment effect assessment. Despite considerable advances in artificial intelligence (AI) for MIS, clinicians remain skeptical of its utility and have low confidence in such black-box systems, a problem exacerbated by poor generalization to out-of-distribution (OOD) data. To move towards effective clinical utilization, we propose a foundation model named EvidenceCap, which makes the box transparent in a quantifiable way through uncertainty estimation. EvidenceCap not only makes AI visible in regions of uncertainty and on OOD data, but also enhances the reliability, robustness, and computational efficiency of MIS. Uncertainty is modeled explicitly through subjective logic theory to gather strong evidence from features. We show the effectiveness of EvidenceCap on three segmentation datasets and apply it in the clinic. Our work sheds light on clinically safe applications and explainable AI, and can contribute towards trustworthiness in the medical domain.
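As background on how subjective logic turns evidence into an explicit uncertainty, the sketch below uses the common Dirichlet-based formulation from evidential deep learning: non-negative per-class evidence gives Dirichlet parameters, and the uncertainty mass shrinks as total evidence grows. Whether EvidenceCap uses exactly this form is an assumption; the function name and the toy evidence values are illustrative only.

```python
from typing import List, Tuple

def subjective_opinion(evidence: List[float]) -> Tuple[List[float], float]:
    """Map non-negative per-class evidence to belief masses and uncertainty.

    Standard subjective-logic mapping (assumed here, not confirmed by the
    abstract): alpha_k = e_k + 1, S = sum(alpha), b_k = e_k / S, u = K / S.
    Belief masses and uncertainty always sum to 1.
    """
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    num_classes = len(evidence)
    strength = sum(e + 1.0 for e in evidence)  # Dirichlet strength S
    belief = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return belief, uncertainty

# A pixel with no evidence is maximally uncertain; strong evidence for one
# class drives the uncertainty down.
belief_none, u_none = subjective_opinion([0.0, 0.0])
belief_strong, u_strong = subjective_opinion([8.0, 0.0])
```

In a segmentation setting this would be evaluated per pixel, so that low-evidence regions show up as a high-uncertainty map alongside the predicted mask.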
While inferring common actor states (such as position or velocity) is an important and well-explored task of the perception system aboard a self-driving vehicle (SDV), it may not always provide sufficient information to the SDV. This is especially true in the case of active emergency vehicles (EVs), where light-based signals also need to be captured to provide a full context. We consider this problem and propose a sequential methodology for the detection of active EVs, using an off-the-shelf CNN model operating at a frame level and a downstream smoother that accounts for the temporal aspect of flashing EV lights. We also explore model improvements through data augmentation and training with additional hard samples.
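The two-stage idea (per-frame scores followed by a temporal smoother) can be sketched with a simple causal moving average. This is only one plausible smoother, chosen for illustration; the abstract does not say which smoother the authors use, and the window size, threshold, and function names below are assumptions.

```python
from collections import deque
from typing import List

def smooth_scores(frame_scores: List[float], window: int = 5) -> List[float]:
    """Causal moving average over the most recent `window` frame scores."""
    recent: deque = deque(maxlen=window)
    smoothed = []
    for score in frame_scores:
        recent.append(score)
        smoothed.append(sum(recent) / len(recent))
    return smoothed

def is_active(frame_scores: List[float],
              window: int = 5,
              threshold: float = 0.4) -> List[bool]:
    """Per-frame active/inactive decision made on the smoothed scores."""
    return [s >= threshold for s in smooth_scores(frame_scores, window)]

# Flashing lights: the frame-level detector fires only on "light on" frames.
raw = [1.0, 0.0, 1.0, 0.0]
smoothed = smooth_scores(raw, window=2)
decisions = is_active(raw, window=2, threshold=0.4)
```

Thresholding the raw scores would flip the decision on every other frame, whereas the smoothed sequence keeps the vehicle marked as active throughout the flashing cycle.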
Seismic data often suffer from severe noise due to environmental factors, which seriously affects subsequent applications. Traditional hand-crafted denoisers such as filters and regularizations use interpretable domain knowledge to design generalizable denoising techniques, but their representation capacity may be inferior to that of deep learning denoisers, which can learn complex and representative denoising mappings from abundant training pairs. However, because high-quality training pairs are scarce, deep learning denoisers may generalize poorly across scenarios. In this work, we propose a self-supervised method that combines the capacity of a deep denoiser with the generalization ability of hand-crafted regularization for random noise attenuation in seismic data. Specifically, we leverage the Self2Self (S2S) learning framework with a trace-wise masking strategy to denoise seismic data using only the observed noisy data. In parallel, we propose weighted total variation (WTV) to further capture the horizontal local smooth structure of seismic data. Our method, dubbed S2S-WTV, enjoys both the high representation ability of the self-supervised deep network and the good generalization ability that comes from the hand-crafted WTV regularizer and the self-supervised nature of the training. As a result, it removes random noise more effectively and stably while preserving the details and edges of the clean signal. To solve the S2S-WTV optimization model, we introduce an algorithm based on the alternating direction method of multipliers (ADMM). Extensive experiments on synthetic and field noisy seismic data demonstrate the effectiveness of our method compared with state-of-the-art traditional and deep learning-based seismic data denoising methods.
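For readers unfamiliar with total variation along one axis, the sketch below computes a weighted sum of absolute differences between horizontally adjacent samples of a 2-D section. It assumes an anisotropic, horizontal-only form with externally supplied weights; how S2S-WTV actually defines and updates its weights is not stated in the abstract, and the function name is hypothetical.

```python
from typing import List, Optional

def weighted_tv_horizontal(section: List[List[float]],
                           weights: Optional[List[List[float]]] = None) -> float:
    """Sum of w[r][c] * |x[r][c+1] - x[r][c]| over all horizontal neighbors.

    `section` is a 2-D array (rows = time samples, columns = traces).
    `weights`, if given, has one entry per horizontal neighbor pair, so
    each row is one element shorter than the rows of `section`.
    """
    total = 0.0
    for r, row in enumerate(section):
        for c in range(len(row) - 1):
            w = 1.0 if weights is None else weights[r][c]
            total += w * abs(row[c + 1] - row[c])
    return total

# A laterally constant row contributes nothing; an oscillating row is penalized.
section = [[1.0, 1.0, 1.0],
           [0.0, 2.0, 0.0]]
plain_tv = weighted_tv_horizontal(section)
down_weighted = weighted_tv_horizontal(section, [[1.0, 1.0], [0.5, 0.5]])
```

Lowering the weight at a location relaxes the smoothness penalty there, which is how a weighted variant can avoid blurring genuine lateral edges.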